Associative memory

ABSTRACT

The associative memory comprises: a simplified functional processing unit (SFPU), implemented by an LUT logic network, that implements a simplified CAM function g, where g is the function derived from a CAM function ƒ by replacing the value showing “invalid” with a don't care; an auxiliary memory that stores the inverse function ƒ⁻¹ of said CAM function ƒ; and an output modifier that checks whether the output value of said SFPU is equal to the output value of the CAM function ƒ. The SFPU produces the operational value (“tentative index value”) of the simplified CAM function g; the auxiliary memory produces the value of the inverse function ƒ⁻¹ when the tentative index value is applied; and the output modifier compares the input data with the value of the inverse function ƒ⁻¹ and produces the output of said SFPU if they are the same, or otherwise a signal showing “invalid”.

TECHNICAL FIELD

This invention concerns an associative memory (Content Addressable Memory, denoted by “CAM”), and in particular an associative memory that performs high-speed search, dissipates low power, and requires a small area.

BACKGROUND ART

For a given index (address), an ordinary memory produces the registered data stored at that address. A CAM, on the other hand, produces the index (address) at which given search (input) data are stored in the CAM (see Non-patent literature 1, 2).

CAMs are used in a wide range of applications, such as pattern matching, Internet routers, processor caches, TLBs (Translation Lookaside Buffers), data compression, database accelerators, neural networks, and memory patches.

By function, CAMs are usually classified into two types: the binary CAM (“BCAM”) and the ternary CAM (“TCAM”). In a BCAM, each cell stores either 0 or 1. In a TCAM, each cell stores 0, 1, or *, where ‘*’ denotes a “don't care”, which matches both 0 and 1.

[Definition 1] (BCAM)

An n-input BCAM table with p registered vectors stores p different two-valued vectors. We assume that the p vectors are stored in the BCAM at addresses 1 to p, in order. The address of each vector can be represented by m bits, where m is given by (1).

[Expression 1]

$m = \lceil \log_2 (p+1) \rceil$  (1)

The corresponding BCAM function ƒ: {0, 1}^(n)→{0, 1}^(m) satisfies the following condition:

For a given input x, if the same vector exists in the BCAM table, then ƒ(x) is the CAM address (from 1 to p) that stores the vector x. If no vector in the BCAM table matches the input x, then ƒ(x) is 0.

(End of Definition)

EXAMPLE 1

Table 1 shows a BCAM storing 7 two-valued vectors. The corresponding BCAM function is shown in Table 2. Both produce, as a 3-bit number (e.g., ‘011’), the address that stores the vector exactly matching the input data. When no vector in the BCAM matches the input vector, the output is 0 (‘000’).

(End of Example)

TABLE 1 Example of a BCAM table.

address  vector
1        0010
2        0111
3        1101
4        0101
5        0011
6        1011
7        0001

TABLE 2 Example of a BCAM function.

x₁ x₂ x₃ x₄  f₂ f₁ f₀
0  0  0  0   0  0  0
0  0  0  1   1  1  1
0  0  1  0   0  0  1
0  0  1  1   1  0  1
0  1  0  0   0  0  0
0  1  0  1   1  0  0
0  1  1  0   0  0  0
0  1  1  1   0  1  0
1  0  0  0   0  0  0
1  0  0  1   0  0  0
1  0  1  0   0  0  0
1  0  1  1   1  1  0
1  1  0  0   0  0  0
1  1  0  1   0  1  1
1  1  1  0   0  0  0
1  1  1  1   0  0  0
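Definition 1 and Example 1 can be modeled in software as follows. This is a minimal sketch, not a hardware description: the names BCAM_TABLE, index_bits, bcam_lookup, and truth_table are hypothetical, and only the table contents are taken from Table 1.

```python
# Registered vectors of Table 1, keyed by address 1..7 (names are illustrative).
BCAM_TABLE = {1: "0010", 2: "0111", 3: "1101", 4: "0101",
              5: "0011", 6: "1011", 7: "0001"}


def index_bits(p):
    """Number of index bits m = ceil(log2(p + 1)) for p >= 1 registered vectors."""
    return p.bit_length()


def bcam_lookup(table, x):
    """Return the address that stores x exactly, or 0 if x is not registered."""
    for address, vector in table.items():
        if vector == x:
            return address
    return 0


def truth_table(table, n):
    """Enumerate the BCAM function over all 2**n inputs, in the style of Table 2."""
    m = index_bits(len(table))
    rows = {}
    for value in range(2 ** n):
        x = format(value, f"0{n}b")
        rows[x] = format(bcam_lookup(table, x), f"0{m}b")
    return rows
```

For instance, bcam_lookup(BCAM_TABLE, "1101") returns address 3, which truth_table renders as the 3-bit string "011".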

[Definition 2] (TCAM)

An n-input TCAM table with p products stores p ternary vectors. We assume that the TCAM stores the p vectors at addresses 1 to p, in order. The address of each vector can be represented by m bits, where m is given by the above-mentioned equation (1). Each element of a ternary vector is 0, 1, or * (don't care). The corresponding TCAM function ƒ: {0, 1}^(n)→{0, 1}^(m) satisfies the following condition:

When the TCAM table has a vector that matches the input x, the output ƒ(x) is the minimum address among the vectors matching the input. If no vector in the TCAM table matches the input vector x, ƒ(x) is 0.

(End of Definition)

EXAMPLE 2

The TCAM shown in Table 3 stores 7 ternary vectors. The corresponding TCAM function is shown in Table 4. Note that the input x=(1, 0, 1, 1) matches the patterns stored at addresses 5 and 6. Since 5 is smaller, the output of the TCAM is 5, that is, (ƒ₂, ƒ₁, ƒ₀)=(1, 0, 1).

(End of Example)

TABLE 3 An example of TCAM table.

address  vector
1        *010
2        0011
3        1101
4        1100
5        *011
6        1*11
7        *001

TABLE 4 An example of TCAM function.

x₁ x₂ x₃ x₄  f₂ f₁ f₀
0  0  0  0   0  0  0
0  0  0  1   1  1  1
0  0  1  0   0  0  1
0  0  1  1   0  1  0
0  1  0  0   0  0  0
0  1  0  1   0  0  0
0  1  1  0   0  0  0
0  1  1  1   0  0  0
1  0  0  0   0  1  0
1  0  0  1   1  1  1
1  0  1  0   0  0  1
1  0  1  1   1  0  1
1  1  0  0   1  0  0
1  1  0  1   0  1  1
1  1  1  0   0  0  0
1  1  1  1   1  1  0
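The minimum-address rule of Definition 2 can be sketched in the same way. The code below is an illustrative software model only; TCAM_TABLE, ternary_match, and tcam_lookup are hypothetical names, and the patterns are those of Table 3.

```python
# Ternary vectors of Table 3; list position i holds address i + 1.
TCAM_TABLE = ["*010", "0011", "1101", "1100", "*011", "1*11", "*001"]


def ternary_match(pattern, x):
    """True if every position of pattern is '*' or equals the bit of x."""
    return all(p == "*" or p == b for p, b in zip(pattern, x))


def tcam_lookup(table, x):
    """Return the minimum address whose pattern matches x, or 0 if none does."""
    for address, pattern in enumerate(table, start=1):
        if ternary_match(pattern, x):
            return address  # addresses are scanned in increasing order
    return 0
```

For the input 1011 of Example 2, both address 5 (*011) and address 6 (1*11) match, and the scan order returns the smaller one.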

The CAM function can be implemented in software, but a software implementation is quite slow. Thus, in many cases, a CAM is implemented by special hardware (semiconductor memory). In the following, we explain a conventional CAM implemented by hardware.

FIG. 8 is a block diagram showing an example of a basic realization of a conventional CAM (see Patent literature 1). CAM 100 consists of a comparison register 101, search bit line drivers 102, n words W₁˜W_(n), n match sense circuits MSC₁˜MSC_(n), n match flag registers MFR₁˜MFR_(n), and a priority encoder PE.

The comparison register 101 stores the m-bit search data. The search bit line drivers 102 drive each bit of the comparison register 101 onto the corresponding search bit line. Each of the words W₁˜W_(n) consists of m CAM cells.

FIG. 9 is a circuit diagram of the CAM cell in FIG. 8. The CAM cell 103 illustrated in FIG. 9 detects a mismatch. The CAM cell 103 consists of a memory cell 104 and a comparison circuit 105. The memory cell 104 is an SRAM memory cell storing 1-bit data. In FIG. 9, D denotes the data, and DN denotes the complement of the data. The comparison circuit 105 compares the data stored in the memory cell 104 with the search data driven on the pair of search bit lines SL and SLN, and produces the comparison result on the match line ML.

The comparison circuit 105 consists of three n-channel MOS transistors (“nMOS”) 106, 107, and 108. The nMOS 106 and 107 are connected in series between the search bit line SLN and the search bit line SL. The gates of nMOS 106 and 107 are connected to the data D and the complemented data DN of the memory cell 104, respectively. nMOS 108 connects the match line ML to ground. The gate of nMOS 108 is connected to the node 109 between nMOS 106 and 107.

First, before a search, the data to be searched are stored in the words W₁˜W_(n) of CAM 100. In each CAM cell 103 of a word, data are written to and read from the memory cell 104 in the same way as in an ordinary SRAM.

To perform a search, the search data are first stored in the comparison register 101. Each bit of the search data is sent to the corresponding search bit line through the search bit line driver 102.

In each of the words W₁˜W_(n), the data stored in the CAM cells are compared simultaneously with the search data on the search bit lines, and the results are sent to the match lines ML₁˜ML_(n). These search results are sent to the match sense circuits MSC₁˜MSC_(n). Each of the match sense circuits MSC₁˜MSC_(n) amplifies its search result and sends it to the corresponding match sense output line MT₁˜MT_(n) as the match sense output.

Each match sense output is stored in the corresponding match flag register MFR₁˜MFR_(n) and sent to the match flag output lines MF₁˜MF_(n) as the match flag output. A match flag of ‘1’ denotes “match”, while ‘0’ denotes “mismatch”.

Each match flag output is sent to the priority encoder PE. According to a predetermined priority, the priority encoder PE selects the address of the word with the highest priority among the matched words (the highest priority match address: HMA) and produces it as the output. We assume that the word W₁ has the highest priority, that the priority decreases as the index increases, and that W_(n) has the lowest priority.

The matching search in each CAM cell 103 of the words W₁˜W_(n) is performed as follows:

First, the initialization operation is performed. In the initialization operation, both lines of each search bit line pair SL and SLN are set to the ‘L’ (=‘0’) state. Meanwhile, according to the data stored in the memory cell 104, one of nMOS 106 and 107 of the comparison circuit 105 is in the ON state and the other is in the OFF state. Therefore, through whichever of nMOS 106 and 107 is ON, the voltage level of the node 109 between nMOS 106 and 107 becomes ‘L’, and nMOS 108 is in the OFF state. In this state, the match lines ML are precharged to the ‘H’ (=‘1’) state. On the match line ML, ‘H’ denotes “match”.

Next, each bit of the search data stored in the comparison register 101 is sent to each CAM cell 103 through the search bit lines. As a result, according to the search data S, one line of the search bit line pair SL, SLN goes to the ‘H’ state and the other to the ‘L’ state.

When the data D stored in the memory cell 104 match the search data S, the level of the node 109 is ‘L’, and nMOS 108 stays in the OFF state.

On the other hand, when the data D differ from the search data S, the level of the node 109 becomes ‘H’, and nMOS 108 turns ON. The match line ML is then discharged and goes to the ‘L’ state.

The match line ML of a CAM word consisting of m CAM cells 103 forms a wired-OR circuit, in which the nMOS 108 of the CAM cells 103 are connected in parallel.

Thus, the match line ML is kept in the ‘H’ (“match”) state only when a match is detected in all m CAM cells 103 of a word. If a mismatch is detected in any of the CAM cells 103, the match line ML goes to the ‘L’ (“mismatch”) state.

For example, assume that, as a result of a search, ‘0’, ‘1’, ‘1’, ‘0’, . . . , ‘1’, ‘0’ are stored in the match flag registers MFR₁˜MFR_(n). In this case, matches are detected in the words W₂, W₃, . . . , W_(n−1). Thus, the priority encoder PE produces the address of the highest priority word W₂ as the HMA. Then, by clearing the match flag stored in the match flag register MFR₂ to ‘0’, the circuit can produce the address of the word W₃, the next highest priority match, as the HMA. In a similar way, the circuit can produce, one after another, the addresses of all the words where a match is detected.
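The behavior of the priority encoder PE described above can be sketched as follows. This is a behavioral model under the stated priority assumption (W₁ highest), not the circuit itself; priority_encode and all_matches are hypothetical names, and the flag list is a small made-up instance of the pattern ‘0’, ‘1’, ‘1’, ‘0’, . . . , ‘1’, ‘0’.

```python
def priority_encode(match_flags):
    """Return the 1-based index of the first set flag (the HMA), or 0 if none."""
    for index, flag in enumerate(match_flags, start=1):
        if flag:
            return index
    return 0


def all_matches(match_flags):
    """List every matched word by repeatedly clearing the flag of the current HMA."""
    flags = list(match_flags)          # work on a copy of the flag registers
    found = []
    while (hma := priority_encode(flags)) != 0:
        found.append(hma)
        flags[hma - 1] = 0             # clear the reported match flag
    return found
```

With the flags [0, 1, 1, 0, 1, 0], the first HMA is word 2, and clearing flags one at a time yields words 2, 3, and 5 in priority order.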

When the circuit is used as a TCAM, both lines of the search bit line pair SL and SLN must be set to the ‘L’ (=‘0’) state for each don't care bit.

FIG. 10 is a circuit diagram of another example of the CAM cell of FIG. 8, different from the one shown in FIG. 9. The CAM cell 103′ shown in FIG. 10 is of “the match detection type”. As in FIG. 9, it has the SRAM memory cell 104 and the comparison circuit 105. The CAM cell 103′ differs from the CAM cell 103 of FIG. 9 in the connection of nMOS 108 of the comparison circuit 105. In FIG. 10, nMOS 108 is connected between the match line ML_(a) and the match line ML_(b). The gate of nMOS 108 is connected to the node 109 between nMOS 106 and 107.

In the CAM cell 103′, as the initialization operation of a search, both lines of the search bit line pair SL and SLN are set to ‘H’. Meanwhile, according to the data stored in the memory cell 104, one of nMOS 106 and 107 of the comparison circuit 105 is in the ON state and the other is in the OFF state. Thus, through whichever of nMOS 106 and 107 is ON, the level of the node 109 becomes ‘H’, and nMOS 108 turns ON. In this state, one terminal of the match line ML is precharged to ‘H’ (=‘1’). On the match line ML, ‘H’ denotes “mismatch”.

In the match line ML of a CAM word consisting of m CAM cells 103′, the nMOS 108 of the CAM cells 103′ are connected in series to form an AND network. Thus, the match lines ML_(a), ML_(b) are precharged to ‘H’ through the nMOS 108 of the CAM cells 103′.

After that, the bits of the search data stored in the comparison register 101 are sent to the CAM cells 103′ through the search bit lines. As a result, according to the search data S, one line of the search bit line pair SL and SLN becomes ‘H’ and the other becomes ‘L’.

When the data D stored in the memory cell 104 match the search data S, the level of the node 109 is ‘H’, and nMOS 108 stays ON.

On the other hand, when the data D and the search data S do not match, the level of the node 109 becomes ‘L’, and nMOS 108 turns OFF.

After the states of all m CAM cells 103′ of the CAM word are determined, discharge starts from one end of the match line ML, and the comparison result is determined at the other end. If there is any mismatching CAM cell 103′, the comparison result is ‘H’, that is, the line stays in the mismatch state. Only when matches are detected in all the CAM cells 103′ is the comparison result ‘L’, that is, the match state.

When the circuit is used as a TCAM, both lines of the search bit line pair SL and SLN must be in the ‘H’ (=‘1’) state for the don't care bits.

REFERENCE

[Patent Literature 1]

-   Japanese Unexamined Patent Application Publication No. 2004-295967

[Patent Literature 2]

-   Japanese Unexamined Patent Application Publication No. 2003-389264

[Patent Literature 3]

-   Japanese Unexamined Patent Application Publication No. 2004-258799

[Patent Literature 4]

-   Japanese Unexamined Patent Application Publication No. 2004-258799

[Non-Patent Literature 1]

-   S. Kohyama (ed.), Very High Speed MOS Device: Very high speed device Series, The first edition, Baifu-Kan Pub. Co., February 1986, pp. 324-325 (in Japanese).

[Non-Patent Literature 2]

-   IEICE, LSI Handbook, The first edition, Ohm-Sha Pub. Co., November 1994, pp. 523-525 (in Japanese).

[Non-Patent Literature 3]

-   Kostas Pagiamtzis and Ali Sheikholeslami, “A Low-power content-addressable memory (CAM) using pipelined hierarchical search scheme”, IEEE Journal of Solid-State Circuits, Vol. 39, No. 9, September 2004, pp. 1512-1519.

[Non-Patent Literature 4]

-   T. Sasao, M. Matsuura, and Y. Iguchi, “A cascade realization of multi-output function for reconfigurable hardware”, International Workshop on Logic and Synthesis (IWLS01), Lake Tahoe, Calif., Jun. 12-15, 2001, pp. 225-230.

[Non-Patent Literature 5]

-   T. Sasao and M. Matsuura, “BDD representation for incompletely specified multiple-output logic functions and its applications to functional decomposition,” Design Automation Conference, June 2005, pp. 373-378.

[Non-Patent Literature 6]

-   Y. Iguchi and T. Sasao “On the LUT cascade architecture,” The 2003 National Convention of IEE Japan, Electronics, Information, and System Group, MC2-4, Aug. 29, 2003, Akita University, Japan (in Japanese).

DISCLOSURE OF THE INVENTION

Problems to be Solved by the Invention

The above-mentioned conventional CAM is faster than a RAM, since it can search data in parallel. However, its device structure is more complex. Thus, the price per bit (bit cost) is 10˜30 times higher than that of a RAM.

Also, the power dissipation per bit of the CAM is much larger than that of a RAM (see Non-patent literature 3). This is because, in the CAM, all the cells are accessed at the same time, as explained above. Thus, the power dissipation per bit can be up to 50 times that of an ordinary RAM.

Thus, the purpose of this invention is to provide a low-power associative memory that requires a small implementation area and uses a simple device structure.

Means for Solving the Problem

In the following, we first give the definitions and theorems necessary to understand this invention, and then explain the method of realizing the function of this invention.

[1] Property of CAM Functions

[Definition 3] (Decomposition Chart, Standard Decomposition Chart, Column Multiplicity)

Assume that a function ƒ(X): B^(n)→B^(q) and X=(x₁, x₂, . . . , x_(n)) are given, where B={0, 1}. Let (X_(L), X_(H)) be a partition of X. The “decomposition chart” of ƒ is a two-dimensional matrix in which the column labels denote all possible assignments of elements of B to X_(L), the row labels denote all possible assignments of elements of B to X_(H), and each matrix entry is the corresponding value ƒ(X_(L), X_(H)).

Among the decomposition charts for the function ƒ, the one where X_(L)=(x₁, x₂, . . . , x_(nL)) and X_(H)=(x_(nL+1), x_(nL+2), . . . , x_(n)) is the “standard decomposition chart”.

The number of different column patterns in the decomposition chart is the “column multiplicity”.

Note that, as a special case of the decomposition chart, we also consider the case of X_(L)=X.

(End of Definition)

[Definition 4] (C-Measure)

Let ƒ be a logic function, and let (x₁, x₂, . . . , x_(n)) be the order of variables. Then, the maximum value of the column multiplicity over the standard decomposition charts for ƒ is the “C-measure” of ƒ.

(End of Definition)

EXAMPLE 3

The C-measure of the function

$f_1 = x_1 x_2 \vee x_3 x_4 \vee x_5 x_6$

is three, while the C-measure of the function

$f_2 = x_1 x_5 \vee x_2 x_6 \vee x_3 x_4$

is eight.

(End of Example)
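Definitions 3 and 4 can be checked by brute force for small functions. The following sketch assumes that the products in Example 3 are combined by logical OR, and the names column_multiplicity, c_measure, f1, and f2 are hypothetical. It enumerates the standard decomposition chart for every split position and counts distinct columns, which is practical only for small n.

```python
from itertools import product


def column_multiplicity(f, n, n_l):
    """Distinct column patterns when X_L = (x1..x_nl) labels the columns."""
    columns = set()
    for x_l in product((0, 1), repeat=n_l):
        # One column: the values of f over all assignments to X_H.
        column = tuple(f(*x_l, *x_h) for x_h in product((0, 1), repeat=n - n_l))
        columns.add(column)
    return len(columns)


def c_measure(f, n):
    """Maximum column multiplicity over all standard decomposition charts."""
    return max(column_multiplicity(f, n, n_l) for n_l in range(1, n + 1))


# The two functions of Example 3, assuming OR between the product terms.
def f1(x1, x2, x3, x4, x5, x6):
    return (x1 & x2) | (x3 & x4) | (x5 & x6)


def f2(x1, x2, x3, x4, x5, x6):
    return (x1 & x5) | (x2 & x6) | (x3 & x4)
```

Under these assumptions, c_measure(f1, 6) evaluates to 3 and c_measure(f2, 6) to 8, in agreement with Example 3. The two functions differ only in how the variables are paired, which illustrates how strongly the C-measure depends on the variable order.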

The column multiplicity of the decomposition chart for ƒ is equal to the width of the MTBDD (multi-terminal binary decision diagram). Thus, the C-measure of a logic function ƒ is equal to the maximum width of the MTBDD with the given ordering of the input variables. For a given logic function ƒ(x₁, x₂, . . . , x_(n)), the C-measure is unique and can be easily computed. As explained later, a function with a small C-measure can be efficiently implemented by an LUT (look-up table) cascade. Thus, the C-measure is a measure of the complexity of a logic function when it is realized by an LUT cascade.

[Lemma 1]

For a given function ƒ, let p be the number of the input combinationsthat produce non-zero outputs. Then, the C-measure of ƒ is at most p+1.

(End of Lemma)

[Theorem 1] (C-Measure of a BCAM Function)

Assume that the BCAM table is given, where p is the number of products in the table. Then, the C-measure of the BCAM function is at most p+1.

(End of Theorem)

[Theorem 2] (C-Measure of a TCAM Function)

Assume that the TCAM table is given, where p is the number of products in the table, and each vector has at most k don't cares. Then, the C-measure of the corresponding TCAM function is at most 2^(k)p+1.

(End of Theorem)

[2] LUT Cascade

A CAM function can also be realized by an ordinary RAM. For example, the BCAM function for the above-mentioned Table 1, which has 7 registered vectors, can be realized by a RAM with 16 words, as shown in Table 2, where each word consists of 3 bits. When an n-input CAM function is realized by a single RAM, the size of the RAM is proportional to 2^(n), even if the BCAM contains only a few vectors. However, by using an LUT cascade, we can drastically reduce the amount of memory (see Patent literature 3).

[Theorem 3]

For a given function ƒ, let X_(L) correspond to the column variables and X_(H) to the row variables of the decomposition chart, and let μ be the column multiplicity of the decomposition chart. Then, the function ƒ can be realized by the network shown in FIG. 1. In this case, the number of signal lines that connect the two blocks H and G (hereafter called the “number of rails”) is at most

[Expression 2]

$\lceil \log_2 \mu \rceil$  (2)

(End of Theorem)

When the number of signal lines that connect the two blocks is smaller than the number of variables in X_(L), the total amount of memory needed to realize the function may be reduced. This method is called “functional decomposition”. By recursively decomposing the given function, we obtain the LUT cascade shown in FIG. 2 (see Non-patent literature 4). An LUT cascade consists of “cells”, and the signal lines connecting adjacent cells are called “rails”. A logic function with a small C-measure can be implemented by a compact LUT cascade. To obtain the C-measure, we need not use a decomposition chart; it can be computed efficiently from the binary decision diagram that represents the characteristic function of the multiple-output function (“BDD_for_CF”) (see Patent literature 2 and Non-patent literature 5).

[Theorem 4]

A logic function with C-measure μ can be realized by an LUT cascade using cells having at most

[Expression 3]

$\lceil \log_2 \mu \rceil + 1$  (3)

inputs and

[Expression 4]

$\lceil \log_2 \mu \rceil$  (4)

outputs.

(End of Theorem)

[Theorem 5]

Consider a function ƒ. Let n be the number of input variables, s the number of cells, r the maximum number of rails (i.e., the number of signal lines between cells), k the maximum number of inputs to a cell, and μ the C-measure of the function ƒ. Then, there exists an LUT cascade satisfying the following relations:

[Expression 5]

$k > r, \quad r = \lceil \log_2 \mu \rceil$  (5)

and

[Expression 6]

$s \leq \left\lceil \frac{n - r}{k - r} \right\rceil$  (6)

(End of Theorem)
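As a worked instance of Theorems 1 and 5, the bound can be evaluated numerically. The sketch below uses hypothetical helper names (ceil_log2, cascade_bound) and assumes n > r. The values n=32 and p=1000 are example values, and the cell size k=12 is an assumption chosen only for illustration.

```python
def ceil_log2(value):
    """ceil(log2(value)) for an integer value >= 1, using integer arithmetic."""
    return (value - 1).bit_length()


def cascade_bound(n, mu, k):
    """Return (r, s_max): rails r = ceil(log2(mu)) and s <= ceil((n - r) / (k - r))."""
    r = ceil_log2(mu)
    if k <= r:
        raise ValueError("each cell needs more inputs than rails (k > r)")
    s_max = -(-(n - r) // (k - r))     # ceiling division
    return r, s_max


# BCAM with n = 32 inputs and p = 1000 vectors: by Theorem 1, mu <= p + 1 = 1001.
rails, max_cells = cascade_bound(n=32, mu=1001, k=12)
```

With these numbers, r = 10 rails and at most ⌈(32−10)/(12−10)⌉ = 11 cells of 12 inputs each are sufficient, compared with the 2³² words needed by a single-RAM realization.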

[3] A Design Method Using Don't Cares.

[3-1] Design Method for a BCAM Function

In a BCAM function, we assume that the number of input combinations with non-zero outputs in the truth table is much smaller than the total number of combinations, 2^(n). In other words, the following (Assumption 1) holds.

(Assumption 1)

Let n be the number of input bits of the BCAM table and let p be the number of vectors. Then p<<2^(n).

For example, for a BCAM with n=32 and p=1000, the ratio of non-zero outputs to the number of all possible minterms is 1000/2³²=2.3×10⁻⁷.

When a BCAM function is represented by a BDD, the maximum width of the BDD is not greater than the C-measure of the function, which by Theorem 1 is at most p+1. Thus, the number of rails is at most ⌈log₂(p+1)⌉. However, at most levels, the width of the BDD is close to p+1. Therefore, when the BCAM function is implemented by an LUT cascade, we need many cells whose number of inputs is

[Expression 7]

$\lceil \log_2 (p+1) \rceil + 1$  (7)

In the following, we explain, using FIG. 3, a method of reducing the hardware needed to implement the BCAM function by means of don't cares.

[Algorithm 1]

(1) Let ƒ be a BCAM function. In ƒ, replace the outputs for all vectors not registered in the CAM by don't cares, to obtain g.

(2) Generate the binary decision diagram representing the characteristic function of g (BDD_for_CF), and simplify it.

(3) From the simplified BDD, generate LUT cascade 1. In general, LUT cascade 1 is simpler than the LUT cascade realizing ƒ (called the “exact LUT cascade”).

(4) When the search data match registered data, LUT cascade 1 produces the correct value. When the search data do not match any of the registered data, LUT cascade 1 may produce an incorrect value.

(5) To fix the error, we use the auxiliary memory 2 with m inputs and n outputs, where m is given by:

[Expression 8]

$m = \lceil \log_2 (p+1) \rceil$  (8)

The auxiliary memory 2 stores, at each address, the corresponding registered vector of the BCAM table.

(6) LUT cascade 1 produces an index as its output and sends it to the auxiliary memory 2, and the auxiliary memory 2 produces the corresponding registered data. The comparator 3 compares the input data with the output of the auxiliary memory 2. If they are the same, the output value of LUT cascade 1 is guaranteed to be correct. In this case, the encoder 4 transfers the index produced by LUT cascade 1 to the external output. On the other hand, if the output data of the auxiliary memory 2 differ from the input data, no matching vector is registered in the CAM. In this case, the encoder 4 produces the index that denotes ‘invalid’ (0).

(End of Algorithm)
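The data flow of steps (4) to (6) can be sketched in software. This is a behavioral model only, with hypothetical names (TABLE, make_simplified_function, make_auxiliary_memory, search). In particular, the don't cares of g are filled here with an arbitrary constant, whereas steps (2) and (3) would choose them to minimize the BDD_for_CF and the resulting LUT cascade.

```python
# Registered vectors of Table 1 (address -> vector).
TABLE = {1: "0010", 2: "0111", 3: "1101", 4: "0101",
         5: "0011", 6: "1011", 7: "0001"}


def make_simplified_function(table, fill=1):
    """Model of g: exact on registered vectors, arbitrary (`fill`) elsewhere.

    The constant fill is an assumption of this sketch; a real design picks
    the don't-care values that make the LUT cascade as small as possible.
    """
    address_of = {vector: address for address, vector in table.items()}
    return lambda x: address_of.get(x, fill)


def make_auxiliary_memory(table):
    """2**m words; word a holds the vector registered at address a."""
    m = len(table).bit_length()            # m = ceil(log2(p + 1))
    memory = [None] * (2 ** m)             # unused words (e.g. address 0) stay empty
    for address, vector in table.items():
        memory[address] = vector
    return memory


def search(x, g, auxiliary_memory):
    """Tentative index from g, verified against the auxiliary memory."""
    tentative = g(x)                            # output of LUT cascade 1
    registered = auxiliary_memory[tentative]    # read of auxiliary memory 2
    return tentative if registered == x else 0  # comparator 3 and encoder 4
```

For the unregistered input "0000", g returns the arbitrary index 1, the auxiliary memory returns "0010", the comparison fails, and the output is 0 (‘invalid’). For every registered vector the tentative index passes the check unchanged.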

The total number of bits in the auxiliary memory 2 is n2^(m), and its hardware cost is negligibly small compared with that of LUT cascade 1.
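The size n2^(m) can be evaluated for the example values n=32 and p=1000 used under Assumption 1. The helper names below are hypothetical, and the comparison with a single-RAM realization (2^(n) words of m bits) is included only to give a sense of scale.

```python
def auxiliary_memory_bits(n, p):
    """Bits in the auxiliary memory: 2**m words of n bits, m = ceil(log2(p + 1))."""
    m = p.bit_length()
    return n * 2 ** m


def single_ram_bits(n, p):
    """Bits in a direct single-RAM realization: 2**n words of m bits."""
    m = p.bit_length()
    return m * 2 ** n


aux_bits = auxiliary_memory_bits(32, 1000)   # 32 * 2**10
ram_bits = single_ram_bits(32, 1000)         # 10 * 2**32
```

For these values, the auxiliary memory needs 32 × 1024 = 32,768 bits, whereas the single-RAM realization would need 10 × 2³² bits.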

[3-2] Design Method for TCAM Function

In the case of a TCAM function, the entries in the TCAM table are ternary, so the auxiliary memory 2 has m inputs and 2n outputs. Also, the comparator ignores the bits that correspond to don't cares. Otherwise, the operation is the same as for the BCAM.
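One way to picture the 2n-bit auxiliary memory word is as an n-bit value plus an n-bit care mask, with the comparator masking out the don't care positions. This encoding is an assumption of the sketch below, as are all the names (encode_entry, make_auxiliary_memory, make_simplified_function, tcam_search). The patterns are those of Table 3, and g is again filled with an arbitrary constant for unmatched inputs.

```python
TCAM_TABLE = ["*010", "0011", "1101", "1100", "*011", "1*11", "*001"]


def encode_entry(pattern):
    """Encode a ternary pattern as (value, care); care bit 0 marks a don't care."""
    value = int(pattern.replace("*", "0"), 2)
    care = int("".join("0" if c == "*" else "1" for c in pattern), 2)
    return value, care


def make_auxiliary_memory(patterns):
    """2**m words of 2n bits each; unused words are left empty (None)."""
    m = len(patterns).bit_length()
    memory = [None] * (2 ** m)
    for address, pattern in enumerate(patterns, start=1):
        memory[address] = encode_entry(pattern)
    return memory


def make_simplified_function(patterns, fill=1):
    """Model of g: minimum matching address if any, arbitrary `fill` otherwise."""
    entries = [encode_entry(p) for p in patterns]

    def g(x):
        for address, (value, care) in enumerate(entries, start=1):
            if (x ^ value) & care == 0:
                return address
        return fill
    return g


def tcam_search(x, g, memory):
    """Verify the tentative index, ignoring the don't care bit positions."""
    tentative = g(x)
    entry = memory[tentative]
    if entry is None:
        return 0
    value, care = entry
    return tentative if (x ^ value) & care == 0 else 0
```

Here the input 0b1011 yields index 5, while the unmatched input 0b0100 receives the arbitrary tentative index 1, fails the masked comparison against *010, and is reported as 0 (‘invalid’).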

[4] Constitution of Present Invention

The associative memory according to the present invention produces a unique index corresponding to input data, and comprises:

a simplified functional processing unit (SFPU), implemented by an LUT logic network or by a PLA (Programmable Logic Array), that implements a function g (hereinafter referred to as the “simplified CAM (Content Addressable Memory) function”), where g is the function derived from ƒ by replacing the value showing “invalid” with a don't care, and ƒ (hereinafter referred to as the “CAM function”) is a function that produces the unique index for given input data;

an auxiliary memory that stores the inverse function ƒ⁻¹ of said CAM function ƒ; and

an output modifier that checks whether the output value of said SFPU is equal to the output value of the CAM function ƒ; wherein

said SFPU produces the operational value (hereinafter referred to as the “tentative index value”) of said simplified CAM function g as the address for said auxiliary memory;

said auxiliary memory produces the value of the inverse function ƒ⁻¹ when the tentative index value is applied as the read address; and

said output modifier compares the input data with said value of the inverse function ƒ⁻¹ produced by the auxiliary memory, and produces the output of said SFPU if they are the same, or the signal showing “invalid” if they are different.

With this constitution, the LUT logic network can be implemented by plural ordinary RAMs. The auxiliary memory can also be implemented by an ordinary RAM. Moreover, instead of the original CAM function ƒ, the simplified function g, obtained from ƒ by replacing the values showing “invalid” with don't cares, is implemented by the LUT logic network or a PLA, which reduces the necessary amount of memory. Thus, as a whole, a CAM function can be implemented in a smaller area than a conventional CAM, using a simpler device structure.

Also, since ordinary RAMs are used to implement the CAM function, the use of a special CAM circuit can be avoided. Thus, in addition to implementation as an ASIC, the CAM function can be easily implemented at lower cost on programmable logic devices that embed general-purpose RAMs, such as FPGAs (Field Programmable Gate Arrays) or CPLDs (Complex Programmable Logic Devices).

Also, except for the output modifier, the circuit can be implemented by ordinary RAMs.

For a single search operation, the index can be obtained by several RAM accesses (the number of RAM accesses in the LUT logic network, plus one), and in each RAM access, only one address of a RAM is accessed. Therefore, the power dissipation can be greatly reduced compared with a conventional CAM.

As for speed, it is slower than a conventional CAM, but much faster than the ordinary method in which a CPU searches a RAM.

Here, the “LUT logic network” denotes a circuit composed of LUTs (Look-Up Tables) connected in a cascade structure or in a network structure. However, it is not limited to circuits consisting of LUTs that are physically placed and interconnected. For example, the LUT logic network can be implemented by a sequential network in which a single memory contains plural LUTs, the selected LUT is changed sequentially, and the output value of the memory (LUT) is fed back to the input of the memory as the read address.

Note that this invention can be applied to implement both BCAM and TCAM.

Effect of the Invention

As shown above, this invention provides an associative memory with high-speed search capability and low power dissipation that requires a small implementation area and uses a simple device structure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the functional decomposition of a logic function.

FIG. 2 shows the LUT cascade with the intermediate outputs.

FIG. 3 explains the method of realizing a CAM function using don't cares.

FIG. 4 shows the overall realization of the associative memory of Embodiment 1 of this invention.

FIG. 5 shows a realization of the reduced function operation unit 5 in FIG. 4.

FIG. 6 shows a realization of the output modifier 7 in FIG. 4.

FIG. 7 shows a realization of the output modifier 7 in Embodiment 2 of the associative memory.

FIG. 8 is a block diagram of an example of a basic realization of a conventional CAM.

FIG. 9 is a circuit diagram showing a realization of a CAM cell in FIG. 8.

FIG. 10 is a circuit diagram showing another realization of a CAM cell in FIG. 8.

EXPLANATION OF REFERENCES

-   1: LUT cascade
-   2: Auxiliary memory
-   3: Comparator
-   4: Encoder
-   5: Reduced function operation unit
-   6: Auxiliary memory
-   7: Output modifier
-   10: Associative memory
-   11: Input variable register
-   12-1˜12-s: Memory for logic
-   13: Output variables register
-   21: EXNOR gate
-   22: AND gate
-   23: AND gate
-   31: pq element

BEST MODE FOR CARRYING OUT THE INVENTION

In the following, we explain the best mode for carrying out the invention with reference to the drawings.

(Embodiment 1)

FIG. 4 shows the overall realization of an associative memory according to Embodiment 1 of this invention. In this embodiment, the associative memory 10 consists of the reduced function operation unit 5, the auxiliary memory 6, and the output modifier 7.

The associative memory 10 produces the unique index A=(a₁, . . . , a_(m)) for n-bit input data X=(x₁, . . . , x_(n)) supplied from the external network. The input data X are applied to the reduced function operation unit 5 and the output modifier 7.

In the following, F denotes the CAM function that produces the corresponding unique index A for input data X. Also, G denotes the simplified CAM function obtained from the CAM function F by replacing the values that show ‘invalid’ with don't cares.

The reduced function operation unit 5 consists of an LUT logic network that implements the simplified CAM function G. The reduced function operation unit 5 produces a tentative index A′=G(X) for input data X. The tentative index A′ is sent to the auxiliary memory 6 and the output modifier 7.

The auxiliary memory 6 stores the inverse function F⁻¹ of the CAM function F. That is, it stores, as an LUT, the function that produces the corresponding data X for each unique index A. The auxiliary memory 6 receives the tentative index A′ produced by the reduced function operation unit 5 and produces the inverse data X′=F⁻¹(A′). The inverse data X′ are sent to the output modifier 7.

The output modifier 7 compares the input data X with the inverse data X′ and decides whether they are the same. When they are the same, it produces the tentative index A′ as the output index A; when they are different, it produces the value showing ‘invalid’.

FIG. 5 shows the realization of the reduced function operation unit 5 of FIG. 4. In Embodiment 1, an LUT cascade circuit is used as the reduced function operation unit 5.

The reduced function operation unit 5 consists of the input variable register 11, the memories for logic 12-1˜12-s, and the output variables register 13.

The input variable register 11 temporarily stores the input data X given from the external inputs. The memories for logic 12-1˜12-s store the subfunctions G₁, . . . , G_(s) as LUTs. When X=(X₁, X₂, . . . , X_(s)) is a partition of X, the subfunctions G₁, . . . , G_(s) are given as follows:

[Expression 9]

$$\begin{aligned}
(A'_1, R_1) &= G_1(X_1) \\
(A'_2, R_2) &= G_2(X_2, R_1) \\
&\;\;\vdots \\
(A'_{s-1}, R_{s-1}) &= G_{s-1}(X_{s-1}, R_{s-2}) \\
A'_s &= G_s(X_s, R_{s-1}) \\
A' &= (A'_1, A'_2, \ldots, A'_s)
\end{aligned} \qquad (9)$$

Here, the vectors A′₁, A′₂, . . . , A′_(s−1) denote the intermediate outputs of the LUT cascade, and the vector A′_(s) denotes the external output of the LUT cascade. The tentative index A′ is represented as a function of these vectors. Also, the vectors R₁, R₂, . . . , R_(s−1) represent the intermediate variables of the LUT cascade.

The output variables register 13 stores the intermediate outputs A′₁, A′₂, . . . , A′_(s−1) and the final output A′_(s), produced by the memories for logic 12-1˜12-s, and produces the tentative index A′.
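The evaluation order of Expression 9 can be sketched in Python as follows. The helper names `make_lut` and `run_cascade`, the 2-bit blocks, the 1-bit intermediate variable, and the cell functions are hypothetical; they only illustrate how each memory for logic is addressed by (X_i, R_(i−1)) and how the partial outputs A′_i are collected.

```python
from itertools import product


def make_lut(fn, x_bits, r_bits):
    """Tabulate fn(x, r) -> (a, r_next) for every block value x and every
    incoming intermediate variable r, as a memory for logic would store it."""
    return {(x, r): fn(x, r)
            for x, r in product(range(1 << x_bits), range(1 << r_bits))}


def run_cascade(luts, x_parts):
    """Evaluate the cells in order; cell i is addressed by (X_i, R_{i-1})."""
    r = 0  # the first cell has no R input, modelled here as a zero-width value
    outputs = []
    for lut, x_i in zip(luts, x_parts):
        a_i, r = lut[(x_i, r)]
        outputs.append(a_i)  # collected as by the output variables register
    return tuple(outputs)    # A' = (A'_1, ..., A'_s)


# Arbitrary toy subfunctions (assumptions, not the G_i of any real CAM table).
def first_cell(x, r):
    return x, int(x == 3)


def later_cell(x, r):
    return x ^ (3 if r else 0), int(r == 1 and x == 3)


LUTS = [
    make_lut(first_cell, x_bits=2, r_bits=0),
    make_lut(later_cell, x_bits=2, r_bits=1),
    make_lut(later_cell, x_bits=2, r_bits=1),  # R output of the last cell is unused
]

tentative = run_cascade(LUTS, (3, 1, 2))
```

Each cell performs one memory read, so the whole cascade needs s sequential reads for an input partitioned into s blocks.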

Note that the detailed operation of an LUT cascade is shown in patent literature 4 and non-patent literature 6. Thus, we omit the explanation.

FIG. 6 shows the realization of the output modifier 7 shown in FIG. 4. FIG. 6 shows an example where the input data X has 8 bits and the output index A has 8 bits. However, the numbers of bits for the input data X and the output index A need not be 8.

The output modifier 7 consists of the EXNOR gates 21, each of which corresponds to one bit of the input data X, the AND gate 22, and the AND gates 23, each of which corresponds to one bit of the output index A.

Each bit of the input data X and the corresponding bit of the inverse data X′, i.e., x_(i) and x′_(i) (i=1, 2, . . . , 8), are connected to the corresponding EXNOR gate 21 to perform the EXNOR operation. The operation results of the EXNOR gates 21 are sent to the AND gate 22 to perform the AND operation. The output Q of the AND gate 22 is called the coincidence decision signal.

On the other hand, each bit of the tentative index A′, a′₁, a′₂, . . . , a′₈, is applied to an input terminal of the corresponding AND gate 23. To the other input terminal of each AND gate 23, the coincidence decision signal Q is applied.

From here, we explain the operation of the associative memory 10 of this embodiment realized as shown above.

First, when the input data X is applied from the external network, the reduced functional operation unit 5 performs the operation of the simplified CAM function G(X), and produces the result as the tentative index A′. If a registered vector exists for the input data X, then the value of the tentative index A′ is the correct index value. On the other hand, if no registered vector exists for the input data X, then the value of the tentative index A′ may be incorrect. Thus, at this stage we cannot tell whether the tentative index A′ represents the correct index or not.

Next, the tentative index A′ is applied to the auxiliary memory 6. The auxiliary memory 6 performs the LUT operation of the inverse function F⁻¹ on the tentative index A′, and produces the inverse data X′=F⁻¹(A′) as the output. Note that, if the tentative index A′ is the correct index, then the inverse data X′ is equal to the input data X. However, when the tentative index A′ is produced by the don't care assignment, the tentative index A′ may be an incorrect value, and the inverse data X′ denotes the ‘invalid value’.

Next, to the i-th EXNOR gate 21 (i=1, 2, . . . , 8) of the output modifier 7, x_(i), the i-th component of the input data X, and x′_(i), the i-th component of the inverse data X′, are applied. The EXNOR gate 21 performs the logic operation:

[Expression 10]

$$q_i = \overline{x_i \oplus x'_i} \tag{10}$$

to produce the value q_(i). The operational value q_(i) is 1 if x_(i) equals x′_(i), and 0 if they are different.

Each operational value q_(i) is applied to the AND gate 22 to perform the AND operation. Thus, when all components of the input data X and the inverse data X′ are exactly the same, the coincidence decision signal Q becomes 1; otherwise, the coincidence decision signal Q becomes 0.

Therefore, by using the coincidence decision signal Q, we can check whether the tentative index A′ shows the correct index value or not.

To the i-th AND gate 23 (i=1, 2, . . . , 8), a′_(i), i.e., the i-th component of the tentative index A′, and the coincidence decision signal Q are applied. Next, each AND gate 23 performs the AND operation to produce a_(i) as the i-th component of the output index A. Thus, when the tentative index A′ shows the correct index value, the output index A is equal to the correct index value; otherwise, 0 is produced as the output index A.
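The gate-level behavior of the output modifier 7 described above can be modelled bit by bit. This is a sketch for the 8-bit case of FIG. 6; the function names and the least-significant-bit-first ordering are assumptions made for illustration.

```python
N = 8  # bit width of X and A in the FIG. 6 example


def bits(value, n=N):
    """Split an integer into n bits, least significant bit first (assumed order)."""
    return [(value >> i) & 1 for i in range(n)]


def output_modifier(x, x_inverse, a_tentative):
    """Return (output index A, coincidence decision signal Q)."""
    # EXNOR gates 21: q_i = 1 exactly when x_i equals x'_i.
    q = [1 - (xi ^ xpi) for xi, xpi in zip(bits(x), bits(x_inverse))]
    # AND gate 22: Q = 1 only if every q_i is 1.
    big_q = int(all(q))
    # AND gates 23: each bit of the tentative index is gated by Q.
    a = [ai & big_q for ai in bits(a_tentative)]
    return sum(b << i for i, b in enumerate(a)), big_q


match = output_modifier(0xA5, 0xA5, 0x03)
mismatch = output_modifier(0xA5, 0xA4, 0x03)
```

With identical X and X′ the tentative index passes through unchanged; a single differing bit forces Q to 0 and the output index to 0.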

In this way, the CAM function operation is performed for the input data X. As explained above, in this embodiment, the reduced functional operation unit 5 and the auxiliary memory 6 are implemented by memories, and we can use ordinary RAMs for them. Therefore, by the refinement of the LSI process, we can reduce the size of the network. Also, since they are memories, the power dissipation can be reduced by putting them into the low-power state when they are idle.

(Embodiment 2)

The overall realization of the associative memory and the realization of the output modifier 7 in Embodiment 2 are similar to FIG. 4 and FIG. 6, respectively, so we omit the explanation. In the associative memory in Embodiment 2, the realization of the reduced functional operation unit 5 is different from that of Embodiment 1.

FIG. 7 shows the realization of the reduced functional operation unit 5 of the associative memory in Embodiment 2. In this embodiment, the reduced functional operation unit 5 consists of a network realized by plural pq elements 31. Note that a pq element 31 is a p-input q-output memory. The values of p and q may be chosen individually for each pq element 31. Each pq element 31 stores, as an LUT, one of the subfunctions G₁, G₂, . . . that are obtained by decomposing the simplified CAM function G. The network that is obtained by connecting plural pq elements 31 is called a “pq network”.

Note that the realization shown in FIG. 7 is an example, and the method of interconnection of pq elements 31 may be changed for the given CAM function G.
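As a rough illustration of a pq network, the following Python sketch models each pq element as a table with 2^p entries of q bits and wires three of them together. The class name `PQElement`, the table contents, and the interconnection are hypothetical and do not reproduce FIG. 7.

```python
class PQElement:
    """A p-input, q-output memory holding one subfunction as an LUT."""

    def __init__(self, p, q, table):
        assert len(table) == 1 << p                       # one word per address
        assert all(0 <= word < (1 << q) for word in table)  # each word fits in q bits
        self.p, self.q, self.table = p, q, table

    def read(self, address):
        return self.table[address]


# Toy elements with arbitrary contents; p and q differ per element.
E1 = PQElement(p=2, q=1, table=[0, 0, 0, 1])
E2 = PQElement(p=2, q=1, table=[0, 1, 1, 1])
E3 = PQElement(p=2, q=2, table=[0, 1, 2, 3])


def pq_network(x):
    """Evaluate a 4-bit input: E1 and E2 each read two input bits,
    and E3 is addressed by their two 1-bit outputs."""
    upper, lower = (x >> 2) & 0b11, x & 0b11
    o1 = E1.read(upper)
    o2 = E2.read(lower)
    return E3.read((o1 << 1) | o2)


result = pq_network(0b1101)
```

Unlike the cascade of Embodiment 1, the elements need not form a single chain; here E1 and E2 can be read in parallel before E3.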

In this way, we can realize an associative memory, similarly to Embodiment 1, even if we use a pq network to implement the simplified functional operation unit 5.

1. An associative memory that produces a unique index corresponding to an input data, comprising: a simplified functional processing unit (SFPU), implemented by an LUT logic network or by a PLA (Programmable Logic Array), that implements a function g (hereinafter referred to as “simplified CAM (Content Addressable Memory) function”), where g is a function derived from ƒ by replacing a value showing “invalid” with the don't care, and ƒ (hereinafter referred to as “CAM function”) is a function that produces the unique index for the input data; an auxiliary memory that stores an inverse function ƒ⁻¹ of said CAM function ƒ; and an output modifier that checks whether an output value of said SFPU is equal to an output value of the CAM function; wherein said SFPU produces an operational value (hereinafter referred to as “tentative index value”) for said simplified CAM function as an address for said auxiliary memory; said auxiliary memory produces a value of the inverse function ƒ⁻¹ when the tentative index value is applied to a read address; said output modifier compares the input data with said value of the inverse function ƒ⁻¹ produced by the auxiliary memory, and produces the output value of said SFPU if they are the same, otherwise produces a signal showing the “invalid” when they are different.
2. The associative memory as claimed in claim 1, wherein said LUT logic network is an LUT cascade logic circuit.
3. The associative memory as claimed in claim 1, wherein said LUT logic network is a pq network.